Abstract. We study the large deviations behavior of systems that admit a certain 
form of a product distribution, which is frequently encountered both in Physics 
and in various information system models. First, to fix ideas, we demonstrate 
a simple calculation of the large deviations rate function for a single constraint 
(event). Under certain conditions, the behavior of this function is shown to exhibit 
an analogue of Bose-Einstein condensation (BEC). More interestingly, we also study 
the large deviations rate function associated with two constraints (and the extension 
to any number of constraints is conceptually straightforward). The phase diagram 
of this rate function is shown to exhibit as many as seven phases, and it suggests 
a two-dimensional generalization of the notion of BEC (or more generally, a multi-dimensional
BEC). While the results are illustrated for a simple model, the underlying
principles are actually rather general. We also discuss several applications and 
implications pertaining to information system models.

1. Introduction

While the theory of statistical physics is traditionally concerned with typical or almost 
typical events, the closely related theory of large deviations deals with rare events 
whose probabilities are exponentially small in the size of the system. More precisely, 
large deviations theory is concerned with the exponential decay rate of probabilities of 
certain rare events, as the number of observations grows without bound. In statistical 
mechanics, there has always been some interest in the statistics of rare events (the 
Kramers escape problem being one example [1]). More recently, the interest in rare 
events has grown due to several applications. For example, in many cases it is important 
to know the probability of an extinction event in non-equilibrium models of epidemics 
[2], [3]. Another example is the measurement of fluctuation theorems [4], [5], such as the
Jarzynski equality, which rely on probing rare events. Interest in rare events has also
emerged recently in the statistics of records and other stochastic processes [6], [7], [8].
Finally, the calculation of large deviations is a natural framework within which one can
define non-equilibrium free energy analogues [9], [10], [11], [12].

In this paper, we consider large deviations pertaining to product measures. In 
particular, we focus on the probability that some quantity of relevance would exceed a 
certain threshold. The paper contains two parts. In the first, we give a discussion with 
a tutorial flavor, which focuses on the calculation of the probability of a simple, single
large deviations event. Our aim, in this part, is to point out, for a non-expert reader,
two important aspects: The first is the relation between large deviations theory and 
conventional statistical physics, and the second is the fact that phase transitions can 
be observed in the large deviations regime even in simple systems with no interactions, 
where phase transitions are not expected in the usual regime of analyzing the typical
behavior of the system. In particular, for product distributions of a certain form, a direct 
analogue of Bose-Einstein condensation (BEC) can be observed. Following this tutorial 
part, we turn to the second part of the paper, where we present results that extend the 
calculations of a single event to accommodate two simultaneous events, and the further 
extension to any finite and fixed number of events is then conceptually obvious. We 
show that even when the two events are physically closely related, the phase diagram 
can exhibit as many as seven different phases. This means that the large deviations 
point-of-view actually suggests a multi-dimensional extension of the notion of BEC. 
Furthermore, we compare phase diagrams of large deviations rate functions pertaining 
to inequality events to those of equality events and it turns out that these two phase 
diagrams are very different. 

To fix ideas, we first illustrate the results for a simple hopping model (closely related
to the models studied in [13], [14], [15]), but they remain valid for fairly general forms of
product distributions, and as such, apply to many physical systems and information
system models. Examples range from black-body radiation (for a related
calculation of large deviations in ideal quantum gases, see [16]) and zero-range processes
(in and out of equilibrium) [17], to Jackson networks, which emerge in queuing theory
(and which are essentially analogous to zero range processes, but with no conservation
of the particle number) [18], [19], driven-diffusive systems [20], and many others. These
product distributions also arise in additional engineering applications. For example, this 
is the natural distribution for a one-way Markov chain, which is defined by an ordered 
set of states, where the only allowed transitions from each state are the self-transition 
and a transition to the next state. One-way Markov processes are commonly used 
in statistical modeling for a wide spectrum of application areas, including information 
theory, communications and signal processing (see Section 5 for details). 

The outline of the remaining part of this paper is as follows: In Section 2, we 
illustrate our results, without the detailed derivation, using a simple one-dimensional 
hopping model, which may describe transport in a disordered medium. In Section 3, 
we derive general results for the large deviations rate function of a single constraint. In 
Section 4, we extend the derivation to incorporate two constraints, and then display the 
corresponding phase diagram. Finally, in Section 5, we discuss several applications to
information system models. 

2. Informal Illustration of the Results 

Throughout this paper, we consider systems whose steady-state behavior admits a 
probability distribution of the product form 

$$P(\{n_i\}) = \frac{1}{Z}\prod_i p_i^{n_i}, \tag{1}$$

where $n_i$ is the number of "particles" in "lattice site" $i$ of the system and $Z$ is a
normalization constant. This means that $\{n_i\}$ are independent geometric random
variables with parameters $\{p_i\}$. The immediate relevance of this model is the
distribution of the occupation numbers $\{n_i\}$ of the various energy levels in the grand
canonical ensemble of an ideal boson gas, where $p_i = z e^{-\beta\epsilon_i}$, $z$ being the fugacity, $\beta$
the inverse temperature, and $\{\epsilon_i\}$ the corresponding energy levels. Other natural
applications of this model, which were mentioned briefly in the Introduction, will be
reviewed in detail in Section 5. One can easily generalize our results to the case where
each factor in the product of (1) is $p_i^{n_i} n_i^{\delta}$. For the sake of simplicity, however, we confine
ourselves throughout to the form (1), for which $\delta = 0$. For concreteness and intuition,
we focus on a particularly simple dynamical model with such a steady-state distribution
(for related models, see [13], [15]). The model is defined on a one-dimensional lattice
with $M$ sites, labeled by $i = 0, 1, \ldots, M-1$. A configuration of the system is defined
by the number of particles $n_i$, $i = 0, 1, \ldots, M-1$, at each site. The evolution is governed
by random sequential dynamics defined by the following rules: Particles enter the
system via site $i = 0$ at rate $\alpha$ (there is no exclusion between the particles). If $n_i > 0$
at site $i$, a particle is transferred from site $i$ to site $i+1$ at rate $\mu_i$. At site $i = M-1$,
particles leave the lattice at rate $\mu_{M-1}$. The model, as illustrated in Fig. 1, is therefore
non-conserving only at the edges of the system. It can be considered as a simple model
for transport in a disordered medium or, more pictorially, as a model of customers being
served along a sequence of $M$ consecutive queues, from left to right. In the latter case,
each site represents a server. In the realm of queuing network theory, this model is a
specific example of a Jackson network [18], [19], and in steady-state, it admits a product
of distributions of geometric random variables, provided that the rate at which particles
flow into the system is small enough, so that the system does not overflow, namely, in
this case, $\alpha < \min_i\{\mu_i\}$. Specifically, the steady-state probability of a configuration
$(n_0, \ldots, n_{M-1})$, in this example, is given by

$$P(n_0, n_1, \ldots, n_{M-1}) = \frac{1}{Z}\prod_{i=0}^{M-1}\left(\frac{\alpha}{\mu_i}\right)^{n_i}, \tag{2}$$

which falls in the framework of (1) with $p_i = \alpha/\mu_i$.

Figure 1. An illustration of the hopping model. Particles enter the system from
the left at rate $\alpha$. If site $i$ is occupied, a particle is transferred to site $i+1$ at rate $\mu_i$.
Particles leave the system at rate $\mu_{M-1}$ on the right-hand side. For a closely related
model see, for example, [13].
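As a small numerical companion to eq. (2), the following sketch maps hopping rates to the geometric parameters $p_i = \alpha/\mu_i$ and evaluates the mean particle density of the steady state. The function names and the numerical rates are illustrative assumptions and do not come from the paper; the only facts used are the stability condition $\alpha < \min_i\{\mu_i\}$ and the mean $p/(1-p)$ of a geometric variable with $P(n) = (1-p)p^n$.

```python
def site_parameters(alpha, mu):
    """Return p_i = alpha / mu_i for every site (hypothetical helper).

    A product-form steady state requires 0 < alpha < min(mu).
    """
    if alpha <= 0 or alpha >= min(mu):
        raise ValueError("need 0 < alpha < min(mu) for a steady state")
    return [alpha / m for m in mu]


def mean_density(p):
    """Average number of particles per site, (1/M) * sum_i p_i / (1 - p_i)."""
    return sum(pi / (1.0 - pi) for pi in p) / len(p)


# Illustrative rates only: four sites, slowest server at i = 0.
mu = [2.0, 3.0, 4.0, 5.0]
alpha = 1.0
p = site_parameters(alpha, mu)
U_min = mean_density(p)   # density below which the particle-number event is not rare
```

With these rates the largest parameter is $p_0 = 1/2$, so site 0 plays the role of the slowest server.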

Our interest is in calculating the probability of a certain large-deviations (rare) 
event X. A simple example of such an event, customarily considered in large deviations 
theory, is that the total number of particles in the lattice exceeds some threshold, that
is, $X = \{(n_0, n_1, \ldots, n_{M-1}) : \sum_{i=0}^{M-1} n_i \ge N\}$. Consider the thermodynamic limit
where both $N$ and $M$ grow without bound, such that their ratio $N/M = U$ is kept
fixed. If $U$ exceeds a minimum value, given by the average particle density, which we shall
denote $U_{\min}$, this event becomes asymptotically rare, and its probability, $\Pr\{X\}$, decays
exponentially with $M$, at the same rate as $\exp[-M \cdot J(U)]$, where $J(U)$ is
referred to as the rate function in large deviations theory. Our interest
will therefore be primarily in the evaluation of J(U). Another relevant question would 
be about characterizing those configurations of the system that dominate J(U). In other 
words, given that the event X has occurred, what are the system configurations that 
one is likely to observe? 

In general, $J(U)$ may not be a smooth function. It may exhibit singularities (e.g.,
discontinuities in the derivatives of $J(\cdot)$) at some value $U = U_c$ (perhaps even at more
than one such value). In the sequel, these will be referred to as phase transitions
of the large deviations rate function. These phase transitions may be manifested not
merely in possible singularities of the function $J$, but more interestingly, in condensation
phenomena pertaining to the dominant configurations of the large deviations event in
question. For example, under certain conditions on the asymptotic behavior of the rates
$\{\mu_i\}$ (analogous to conditions on the density of states in BEC), for $U > U_c$, the dominant
configurations become condensed: Although the total number of sites $M$ grows without
bound, a macroscopic fraction of the particles reside in only one of them. Loosely
speaking, the particles are essentially jammed at the site (or server) with the slowest
exit rate [20]. The value $U_c$ is analogous to the critical density in the ordinary BEC
transition. For large deviations events of the type $\sum_i n_i \ge N$, the rate function exhibits
an additional phase transition: When $U < U_{\min}$, the event in question is no longer rare
and so $J(U) = 0$. This is a direct result of looking at an event defined by an inequality
constraint rather than an equality constraint. If one considers instead a constraint of
the form $\sum_i n_i = N$, one would find two phases only, a condensed phase ($U > U_c$) and
a non-condensed phase ($U < U_c$), and not three. In the sequel, we will elaborate more
on this difference between equality constraints and inequality constraints.

Interestingly, condensation phenomena occur also for other constraints defined in 
terms of various linear combinations of $\{n_i\}$. For example, $T = \sum_i n_i/\mu_i$ is a plausible
estimate for the total time that a particle would spend in the system, because $n_i/\mu_i$ is
the expected time that each particle spends at site $i$ before being moved, in its turn, to
the right. Consider now the event $T \ge M \cdot V$. The large deviations behavior of this event
also exhibits two phase transitions, one at $V = V_{\min}$, where $J(V)$ ceases to be identically
zero and becomes strictly positive, and the other at $V = V_c$, from the non-condensed
to the condensed phase, with $V_c$ depending on the rates in the system. Once again, in
the condensed phase, particles essentially jam at the site with the smallest exit rate. 
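The threshold $V_{\min}$ of the time-like constraint can be evaluated in the same way as the mean density. The sketch below assumes that $V_{\min}$ is the steady-state average of $T/M$, that is, $(1/M)\sum_i \langle n_i\rangle/\mu_i$ with $\langle n_i\rangle = \alpha/(\mu_i - \alpha)$; the function name and the rates are illustrative and not taken from the paper.

```python
def mean_time_density(alpha, mu):
    """(1/M) * sum_i <n_i> / mu_i, with <n_i> = alpha / (mu_i - alpha).

    Hypothetical helper; requires alpha < min(mu) so that every mean is finite.
    """
    if alpha >= min(mu):
        raise ValueError("need alpha < min(mu)")
    return sum(alpha / (m * (m - alpha)) for m in mu) / len(mu)


# Illustrative rates only (same as a four-site chain with the slowest site first).
V_min = mean_time_density(1.0, [2.0, 3.0, 4.0, 5.0])
```

For thresholds $V$ below this value the event $T \ge M \cdot V$ is typical and the rate function vanishes.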

More surprising and interesting is the phase diagram obtained for the joint 
probability of two rare events pertaining to two different linear combinations of $\{n_i\}$,
say, $\Pr\{\sum_i n_i \ge M \cdot U,\ \sum_i n_i/\mu_i \ge M \cdot V\}$, which decays exponentially according to
$\exp\{-M \cdot J(U,V)\}$ for some rate function $J(U,V)$. As we show in the sequel, even
if the two constraints are physically closely related, the large deviations rate function
$J(U,V)$ has a very rich phase diagram, with as many as seven
different phases. We find three distinct types of condensed phases: one for each one of the
individual events and a third one for their combination, which gives rise to the notion of
a two-dimensional condensation. Furthermore, the phase diagram associated with the
corresponding equality constraints, $\sum_i n_i = MU$ and $\sum_i n_i/\mu_i = MV$, is dramatically
different from that of the inequality constraints, with two phases only rather than
seven. Note that this two-dimensional condensation is very different in nature from
that considered in the context of two distinct conserved quantities [21], [22], [23]. The
two-constraint problem, in its general form, is the focus of the main part of the paper. In 
the next section, we present a detailed derivation of the results for the single constraint 
problem.

3. A Single Constraint

As mentioned in the Introduction, we begin with a simple single constraint, assuming 
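that one has a product distribution of the form of eq. (1). Referring to the terminology
of particles and sites, from the example of Section 2, consider first the probability of
the event that the number of particles in the system is larger than some threshold,
$\sum_i n_i \ge M \cdot U$. The large deviations evaluation of this probability is typically done
using the Chernoff bound. Specifically, consider the following chain of inequalities:

$$\begin{aligned}
\Pr\Big\{\sum_i n_i \ge MU\Big\} &\le \Big\langle z^{\sum_i n_i - MU}\Big\rangle \\
&= z^{-MU}\prod_{i=0}^{M-1}\frac{1-p_i}{1-zp_i} \\
&= \exp\Big\{-M\Big[U\ln z - \frac{1}{M}\sum_{i=0}^{M-1}\ln\frac{1-p_i}{1-zp_i}\Big]\Big\}, \qquad z \ge 1,
\end{aligned} \tag{3}$$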



where the angular brackets denote an expectation with respect to the distribution of
eq. (1). The tightest bound, which gives the large-deviations rate function, is obtained
by minimization of the Chernoff bound over z, or equivalently, by maximization of the 
bracketed expression at the exponent: 

$$J(U) = \sup_{z \ge 1}\left[U\ln z - \lim_{M\to\infty}\frac{1}{M}\sum_{i=0}^{M-1}\ln\frac{1-p_i}{1-zp_i}\right], \tag{4}$$



provided that the limit exists for all z > 1. 
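For a finite number of sites, the maximization over $z$ can be carried out numerically. The sketch below is one possible way to do it, not the paper's procedure: it uses the fact that the derivative of the bracketed expression with respect to $\ln z$ is $U - \frac{1}{M}\sum_i zp_i/(1-zp_i)$, which is decreasing in $z$, so the maximizer can be located by bisection on $[1, 1/\max_i p_i)$. The function name and tolerance are assumptions made for illustration.

```python
import math


def chernoff_exponent(p, U, tol=1e-12):
    """Maximize U*ln(z) - (1/M) * sum_i ln((1 - p_i) / (1 - z*p_i)) over z >= 1.

    Returns the pair (z, value). Hypothetical helper for finite M.
    """
    M = len(p)

    def density(z):
        # (1/M) * sum_i z*p_i / (1 - z*p_i): mean particle density under the tilt z
        return sum(z * pi / (1.0 - z * pi) for pi in p) / M

    if U <= density(1.0):
        return 1.0, 0.0              # the event is not rare: z = 1 and exponent 0
    lo, hi = 1.0, 1.0 / max(p)       # density(z) diverges as z approaches hi
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if density(mid) < U:
            lo = mid
        else:
            hi = mid
    z = lo
    value = U * math.log(z) - sum(
        math.log((1.0 - pi) / (1.0 - z * pi)) for pi in p) / M
    return z, value


# Illustrative use with p_i = alpha / mu_i for alpha = 1 and mu = (2, 3, 4, 5).
z_opt, exponent = chernoff_exponent([0.5, 1.0 / 3.0, 0.25, 0.2], U=2.0)
```

When all $p_i$ are equal to a common value $p$, the maximizer is $z = U/((1+U)p)$, which gives a convenient check of the routine.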

Note that the Chernoff parameter $z$, which undergoes optimization, is almost
equivalent to the fugacity $z$ of the grand-canonical ensemble, which controls the expected
number of particles in the system, and the minimization of the bound is parallel to the
usual saddle-point evaluation pertaining to the grand partition function. The only
difference comes about since the Chernoff bound is concerned with the probability of
the inequality event $\sum_i n_i \ge MU$, as opposed to the event $\sum_i n_i = MU$, which defines
the canonical ensemble with $N = MU$. This implies that one is interested in $z \ge 1$,
and that when the number of particles is below its average value, $J = 0$. For rare events
(with $J > 0$), the distinction between $\Pr\{\sum_i n_i \ge N\}$ and $\Pr\{\sum_i n_i = N\}$ becomes
meaningless in the limit of large $N$ due to the exponential decay of the probability with
$M$. With this analogy, clearly, in the limit of large $M$ (or equivalently $N$), the bound
gives an asymptotically exact value of the rate function $J$ (see, e.g., [24]). In other
words, the calculation of the large deviations probability of a rare event is essentially
identical to a change of ensembles in traditional statistical physics, with the rate function
$J$ playing the role of a free energy. An extra, somewhat trivial, phase occurs due to the
constraint taking the form of an inequality and not an equality of the form $\sum_i n_i = N$.
In the latter case, the phase with $J = 0$ would not exist. With this in mind, what
follows in the next paragraph is standard.
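The asymptotic tightness of the bound can be checked directly in the simplest special case, which is chosen here only for illustration: if all $p_i$ equal a common value $p$, the sum $\sum_i n_i$ is negative binomial, so the exact tail probability is available and can be compared with $e^{-MJ(U)}$. The function names below are hypothetical; the closed form of $J(U)$ follows from eq. (4) with $z = U/((1+U)p)$.

```python
import math


def log_tail_prob(M, p, N):
    """ln Pr{n_0 + ... + n_{M-1} >= N} for i.i.d. geometric n_i, P(n) = (1-p) p**n.

    The sum is negative binomial: Pr{S = k} = C(k+M-1, k) (1-p)**M p**k.
    """
    log_first = (math.lgamma(N + M) - math.lgamma(N + 1) - math.lgamma(M)
                 + M * math.log(1.0 - p) + N * math.log(p))
    total, term, k = 1.0, 1.0, N
    while term > 1e-17 * total:
        term *= p * (k + M) / (k + 1.0)   # ratio Pr{S = k+1} / Pr{S = k}
        total += term
        k += 1
    return log_first + math.log(total)


def rate_function_iid(p, U):
    """Closed-form J(U) for equal p_i = p, valid for U above the mean p/(1-p)."""
    z = U / ((1.0 + U) * p)
    return U * math.log(z) - math.log((1.0 - p) * (1.0 + U))


def exponent_gap(M, p, U):
    """Difference between the exact exponent -ln(Pr)/M and the rate function."""
    return -log_tail_prob(M, p, int(round(M * U))) / M - rate_function_iid(p, U)
```

The gap is non-negative by the Chernoff bound and shrinks as $M$ grows, consistent with the bound being exponentially tight.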

As mentioned earlier, in order to proceed from eq. (3), we must assume that the
limit in eq. (4) exists. We will assume that there exists a density function $g(t) \ge 0$,
integrating to unity, such that in the limit of $M \to \infty$, the fraction of $\{p_i\}$ that fall
between $t$ and $t + dt$ tends to $g(t)dt$ for all $t \in (0,1)$. Performing a saddle-point
approximation on eq. (4) gives the following equation for the optimum choice of $z$:‡

$$U = \lim_{M\to\infty}\frac{1}{M}\sum_{i=0}^{M-1}\frac{zp_i}{1-zp_i} = z\int_0^{p_m}\frac{t\,g(t)\,dt}{1-zt}, \tag{5}$$

where $p_m = \max_i p_i$, and where it is assumed that $p_m$ is attained by the same $i$ (say,
$i = 0$ without loss of generality) for all $M$. Let us denote

$$U(z) = z\int_0^{p_m}\frac{t\,g(t)\,dt}{1-zt}. \tag{6}$$

We therefore have to solve the equation $U = U(z)$, where the solution $z$ is sought in
the range $[1, 1/p_m)$. Now, in analogy with BEC, if $g(p_m) = 0$ and $\lim_{t\uparrow p_m} g(t)/(p_m - t)^x$
is positive and finite for some $x > 0$, then $U(1/p_m) < \infty$, and so, the large deviations
behavior exhibits a condensation. In other words, as long as $U$ is below the critical
density

$$U_c = U(1/p_m) = \int_0^{p_m}\frac{t\,g(t)\,dt}{p_m - t}, \tag{7}$$

there is no condensation, while for $U > U_c$, condensation takes place. This means that
the large deviations event in question is dominated by realizations for which $n_0/M$ is
about $U - U_c > 0$, while all other states have negligible relative contributions. Here $n_0$
is the occupation at the site $i = 0$, corresponding to $p_m$. Denoting by $U_{\min} = U(1)$ the
average particle density in the system, the corresponding large deviations rate function
is given by:

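$$J(U) = \begin{cases}
0, & U < U_{\min} \\
U\ln z - \int_0^{p_m} dt\,g(t)\ln\frac{1-t}{1-zt}, & U_{\min} \le U < U_c \\
U\ln\frac{1}{p_m} - \int_0^{p_m} dt\,g(t)\ln\frac{1-t}{1-t/p_m}, & U \ge U_c
\end{cases}$$

where, in the second line, $z$ is the solution of $U = U(z)$,
so that $\Pr\{\sum_i n_i \ge N\}$ is of the exponential order of $\exp\{-MJ(U)\}$ and the large
deviations rate function exhibits three phases: the first is where $U$ is below the average
value, the second is the non-condensed phase, and the third is the condensed phase.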
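The three phases can be made concrete with a density that vanishes at $p_m$. The sketch below assumes, purely for illustration, $g(t) = (x+1)(p_m - t)^x/p_m^{x+1}$ on $(0, p_m)$, for which the critical density works out to $U_c = 1/x$. The constants `P_M`, `X`, `GRID` and all function names are hypothetical choices; the integrals are evaluated by a plain midpoint rule, which is adequate here but not a recommendation for production use.

```python
import math

P_M = 0.5     # largest site parameter p_m (illustrative value)
X = 1.0       # exponent with which g vanishes at p_m (illustrative value)
GRID = 4000   # number of midpoint-rule cells


def g(t):
    """Assumed density of the p_i on (0, P_M); it integrates to one."""
    return (X + 1.0) * (P_M - t) ** X / P_M ** (X + 1.0)


def integrate(f):
    """Midpoint rule on (0, P_M); the endpoints are never evaluated."""
    h = P_M / GRID
    return h * sum(f((k + 0.5) * h) for k in range(GRID))


def density(z):
    """U(z) = z * integral of t g(t) / (1 - z t)."""
    return z * integrate(lambda t: t * g(t) / (1.0 - z * t))


def rate_function(U):
    """Return (J(U), phase label) following the three-case formula above."""
    z_max = 1.0 / P_M
    U_min, U_c = density(1.0), density(z_max)
    if U <= U_min:
        return 0.0, "below the mean"
    if U >= U_c:
        z, phase = z_max, "condensed"
    else:
        lo, hi = 1.0, z_max
        for _ in range(60):              # bisection on the increasing map U(z)
            mid = 0.5 * (lo + hi)
            if density(mid) < U:
                lo = mid
            else:
                hi = mid
        z, phase = 0.5 * (lo + hi), "non-condensed"
    J = U * math.log(z) - integrate(
        lambda t: g(t) * math.log((1.0 - t) / (1.0 - z * t)))
    return J, phase
```

In the condensed phase the optimizing $z$ is pinned at $1/p_m$, so $J(U)$ grows linearly in $U$ with slope $\ln(1/p_m)$, which is one simple way to recognize condensation numerically.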

The above derivation can be extended quite straightforwardly to deal with more
general large deviations events, defined in terms of arbitrary linear combinations of $\{n_i\}$,
that is, events of the form $\{(n_0, n_1, \ldots, n_{M-1}) : \sum_{i=0}^{M-1} u_i n_i \ge M \cdot U\}$, where $\{u_i\}_{i=0}^{M-1}$
are arbitrary deterministic constants. For a meaningful definition of the asymptotic
regime, one has to define the behavior of the infinite sequence $u_0, u_1, u_2, \ldots$, as was
done concerning the infinite sequence of parameters $p_0, p_1, p_2, \ldots$. For the sake of
simplicity, we will assume $u_i$ to be a function of $p_i$, i.e., $u_i = u(p_i)$, for a certain given
function $u : [0,1] \to \mathbb{R}$. We are then considering large deviations events of the form
$\sum_{i=0}^{M-1} n_i u(p_i) \ge MU$. In the example discussed in Section 2, $u(p) = p/\alpha$, so that the
sum becomes $\sum_{i=0}^{M-1} n_i/\mu_i \ge MU$. In the example of the ideal Bose gas, where $p_i = e^{-\beta\epsilon_i}$,
the energy constraint $\sum_i n_i\epsilon_i \ge MU$ corresponds to $u(p) = -\frac{1}{\beta}\ln p$.§ Similarly as in
the earlier derivation, the saddle-point equation is now given by

$$U = \int_0^{p_m}\frac{t\,u(t)\,z^{u(t)}\,g(t)\,dt}{1 - tz^{u(t)}},$$

and when $U$ exceeds the critical value $U_c$, obtained by evaluating the right-hand side
at the largest admissible value $z_c$ of $z$,
condensation takes place, provided that $U_c < \infty$.

‡ Note that for $p_i = e^{-\beta\epsilon_i}$, this is exactly the classical equation that underlies the BEC.
Several comments are now in order: 

(i) It is a simple exercise to show that in two dimensions and above, an ordinary 
black-body would undergo a condensation when a constraint on the total number 
of photons is considered. This is evident by identifying g(t) as the density of states 
of the photons, $U$ as the density of photons in the event considered and $p_i = e^{-\beta\epsilon_i}$
with $\epsilon_i$ the energy of a photon in mode $i$.

(ii) Different constraints can lead to condensates in different places. For example,
assume that the hopping rates in the model of Section 2 are ordered so that the
slowest site is at site $i = 0$ and the fastest is at site $i = M-1$. By looking
at a constraint on $U$, one obtains a condensation at site $i = 0$. However, if one
looks at a constraint on the quantity $Q = \sum_{i<M/2}(\mu_i - \mu_{M/2})^{\psi} n_i$, one can obtain a
condensation at $i = M/2$ if $\psi$ is large enough.

(iii) In the ordinary BEC, where $u(t) = 1$, the critical density could be finite only if
$g(p_m) = 0$ and $\lim_{t\to p_m} g(t)/(p_m - t)^x$ is positive and finite for some $x > 0$. In the
more general case considered now, there are choices for non-negative functions $u(t)$
such that $U_c < \infty$ even if $g$ does not vanish at $p_m$. What counts is the rate at which
the denominator of the integrand at the critical value $z_c$ of $z$, namely $1 - tz_c^{u(t)}$, tends
to zero as $t \to p_m$. If $1 - tz_c^{u(t)}$
behaves like $|t - p_m|^x$ in the neighborhood of $p_m$, for some $0 < x < 1$, and $g(t)$ is
continuous and finite at $t = p_m$, then $U_c < \infty$. This in turn is possible because then
the corresponding $u(t)$ would behave like $\log[(1 - |t - p_m|^x)/t]$, which is positive in
the neighborhood of $p_m$.

Having covered the single constraint problem, we now turn to the more interesting 
case where two constraints are considered simultaneously. Note that the analogy with 
a change of an ensemble is much weaker here. When considering large deviations, there 
is a freedom to choose any combination of constraints, so that in contrast to the usual 
statistical physics, the phase diagrams can have arbitrary dimensions. 

4. Two Constraints 





Having viewed the BEC from a large deviations perspective, it is instructive to further 
extend the scope and consider the joint large deviations behavior of two events or more. 

§ Albeit, in this case, the corresponding constraint does not give rise to condensation.

Consider the rate function of two joint events

$$\left\{\sum_{i=0}^{M-1} u_i n_i \ge MU,\ \sum_{i=0}^{M-1} v_i n_i \ge MV\right\},$$

where, once again, for the sake of simplicity, we assume that $u_i$ and $v_i$ depend on
$i$ only via $p_i$, i.e., $u_i = u(p_i)$ and $v_i = v(p_i)$ for certain given functions $u(\cdot)$ and
$v(\cdot)$. We confine ourselves to the case where the functions $u(\cdot)$ and $v(\cdot)$ are non-negative.
This accommodates the examples discussed earlier in Section 2. Denoting
$X = \{\sum_{i=0}^{M-1} u(p_i)n_i \ge MU,\ \sum_{i=0}^{M-1} v(p_i)n_i \ge MV\}$ and applying a two-dimensional
Chernoff bound, we have:



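$$\begin{aligned}
\Pr\{X\} &\le z_1^{-MU} z_2^{-MV}\left\langle z_1^{\sum_i u(p_i)n_i}\, z_2^{\sum_i v(p_i)n_i}\right\rangle \\
&= z_1^{-MU} z_2^{-MV}\prod_{i=0}^{M-1}\frac{1-p_i}{1-p_i z_1^{u(p_i)} z_2^{v(p_i)}} \\
&= \exp\left\{-M\left[U\ln z_1 + V\ln z_2 - \frac{1}{M}\sum_{i=0}^{M-1}\ln\frac{1-p_i}{1-p_i z_1^{u(p_i)} z_2^{v(p_i)}}\right]\right\}, \qquad z_1 \ge 1,\ z_2 \ge 1.
\end{aligned} \tag{8}$$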



Again, the limitation $z_1 \ge 1$ and $z_2 \ge 1$ ensures that when we look at events where
$U$ and $V$ take on values smaller than the expectations $\lim_{M\to\infty}\frac{1}{M}\sum_i u(p_i)\langle n_i\rangle$ and
$\lim_{M\to\infty}\frac{1}{M}\sum_i v(p_i)\langle n_i\rangle$, respectively, the rate function would vanish. As before, to
derive the rate function, we maximize the expression in the square brackets (which is
a saddle-point analysis) by equating its partial derivatives with respect to $z_1$ and $z_2$
to zero. In the thermodynamic limit, we get the following two equations with the two
unknowns $z_1$ and $z_2$:



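$$\begin{aligned}
U &= U(z_1,z_2) = \int_0^{p_m}\frac{t\,u(t)\,z_1^{u(t)}z_2^{v(t)}\,g(t)\,dt}{1 - tz_1^{u(t)}z_2^{v(t)}}, \\
V &= V(z_1,z_2) = \int_0^{p_m}\frac{t\,v(t)\,z_1^{u(t)}z_2^{v(t)}\,g(t)\,dt}{1 - tz_1^{u(t)}z_2^{v(t)}},
\end{aligned} \tag{9}$$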



where as before, $p_m$ is the maximum of $\{p_i\}$, which is again assumed to be attained at
$i = 0$ for all $M$. In analogy to usual BEC, $z_1$ and $z_2$ are jointly limited by the inequality

$$\sup_{0 < t \le p_m}\left[tz_1^{u(t)}z_2^{v(t)}\right] < 1, \quad\text{or equivalently,}\quad \sup_{0 < t \le p_m}\left[u(t)\ln z_1 + v(t)\ln z_2 + \ln t\right] < 0. \tag{10}$$
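For a finite system, the two-parameter maximization of the bracketed expression in eq. (8) can be sketched numerically. The routine below is an illustrative assumption, not the paper's method: it works in the variables $a = \ln z_1 \ge 0$ and $b = \ln z_2 \ge 0$, in which the bracket is concave, and alternates one-dimensional bisections on the two partial derivatives (coordinate ascent). It assumes non-negative weights with at least one positive entry in each of `u` and `v`; all names are hypothetical.

```python
import math


def two_constraint_exponent(p, u, v, U, V, sweeps=200):
    """Maximize U*a + V*b - (1/M) sum_i ln((1-p_i)/(1-p_i*exp(a*u_i + b*v_i)))
    over a = ln z1 >= 0 and b = ln z2 >= 0. Returns (z1, z2, value)."""
    M = len(p)

    def slope(a, b, w, target):
        # Partial derivative of the bracket along the coordinate with weights w.
        s = 0.0
        for pi, ui, vi, wi in zip(p, u, v, w):
            q = pi * math.exp(a * ui + b * vi)
            if q >= 1.0:
                return -math.inf          # outside the admissible region
            s += wi * q / (1.0 - q)
        return target - s / M

    def best(f):
        # Largest x >= 0 at which the decreasing function f is still positive.
        if f(0.0) <= 0.0:
            return 0.0
        lo, hi = 0.0, 1.0
        while f(hi) > 0.0:
            lo, hi = hi, 2.0 * hi
        for _ in range(80):
            mid = 0.5 * (lo + hi)
            if f(mid) > 0.0:
                lo = mid
            else:
                hi = mid
        return lo

    a = b = 0.0
    for _ in range(sweeps):
        a = best(lambda x: slope(x, b, u, U))
        b = best(lambda x: slope(a, x, v, V))
    value = U * a + V * b - sum(
        math.log((1.0 - pi) / (1.0 - pi * math.exp(a * ui + b * vi)))
        for pi, ui, vi in zip(p, u, v)) / M
    return math.exp(a), math.exp(b), value


# Illustrative use: hopping model with alpha = 1, mu = (2, 3, 4, 5),
# constraints on sum_i n_i and on sum_i n_i / mu_i.
p_sites = [0.5, 1.0 / 3.0, 0.25, 0.2]
z1, z2, J_uv = two_constraint_exponent(
    p_sites, [1.0] * 4, [0.5, 1.0 / 3.0, 0.25, 0.2], U=2.0, V=0.8)
```

If one of the thresholds lies below its conditional mean, the corresponding parameter stays at 1 and the result reduces to the single-constraint exponent.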



In the sequel, we will refer to the following notation: For a given $z_1$, let $\phi(z_1)$ be the
supremum of the values of $z_2$ that do not violate eq. (10), and let
$\mathcal{A} = \{(z_1,z_2) : z_1 \ge 1,\ z_2 \ge 1,\ z_2 < \phi(z_1)\}$. We now use eqs. (9) and (10) to derive the phase diagram
for the large deviations rate function. For convenience, the final results are summarized
towards the end of this section. The phase diagram, shown in Fig. 2, has seven different
phases.
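The boundary function $\phi(z_1)$ is easy to evaluate once eq. (10) is written in logarithmic form: for every $t$ with $v(t) > 0$ one needs $\ln z_2 < [-\ln t - u(t)\ln z_1]/v(t)$, so $\ln\phi(z_1)$ is the minimum of the right-hand side over $t$. The sketch below evaluates this minimum on a grid; the function name, the grid, and the parameter values are assumptions for illustration, with $u(t) = 1$ and $v(t) = t/\alpha$ corresponding to the two constraints of Section 2.

```python
import math


def phi(z1, u, v, grid):
    """Supremum of z2 with t * z1**u(t) * z2**v(t) < 1 for all t in grid.

    Hypothetical helper; only grid points with v(t) > 0 restrict z2.
    """
    log_z2 = min((-math.log(t) - u(t) * math.log(z1)) / v(t)
                 for t in grid if v(t) > 0)
    return math.exp(log_z2)


alpha, p_m = 1.0, 0.5                      # illustrative values
u = lambda t: 1.0                          # particle-number constraint
v = lambda t: t / alpha                    # time-like constraint, v(p) = 1/mu
grid = [p_m * (k + 1) / 1000 for k in range(1000)]   # t in (0, p_m]

Z2 = phi(1.0, u, v, grid)   # endpoint of the curve A-B (point B)
Z1 = 1.0 / p_m              # here phi(Z1) = 1, the endpoint of A-C (point C)
```

For this choice of $u$ and $v$ the minimum is always attained at $t = p_m$, so $\phi(z_1) = (p_m z_1)^{-\alpha/p_m}$, which decreases from $Z_2$ to 1 as $z_1$ runs from 1 to $Z_1$.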

Phase 0: not a rare event. The first, trivial, phase occurs when both $U$ and $V$ take
on values below the expectations, $U(1,1)$ and $V(1,1)$, respectively. This is the region
where the events are not rare and so, $J(U,V) = 0$.

Phase 1: no condensation. This phase is analogous to the non-condensed phase of
the single event. Here, as long as the pair $(U,V)$ falls in a region for which the equations

$$U = U(z_1,z_2); \qquad V = V(z_1,z_2)$$

have a solution $(z_1,z_2) \in \mathcal{A}$, one may substitute this solution into the Chernoff
bound and obtain the rate function, which in the thermodynamic limit is given by

$$J(U,V) = U\ln z_1 + V\ln z_2 - \int_0^{p_m} dt\,g(t)\ln\frac{1-t}{1-tz_1^{u(t)}z_2^{v(t)}}.$$

This phase is the image (under the transformation defined by the pair of equations
$U = U(z_1,z_2)$, $V = V(z_1,z_2)$) of the set $\mathcal{A}$ in the $z_1$-$z_2$ plane: It is surrounded by three
curves that connect the points A, B and C in Fig. 2. The curve A-B corresponds to the
collection of points where $z_1 = 1$, while $z_2$ varies from 1 (point A) up to its maximum
allowed value $z_2 = \phi(1) = Z_2$ (point B). Similarly, the curve A-C corresponds to $z_2 = 1$
and $z_1$ varying from 1 to $\phi^{-1}(1) = Z_1$. Finally, the curve B-C corresponds to the curve
$z_2 = \phi(z_1)$, where as $z_1$ increases from 1 to $Z_1$, $\phi(z_1)$ decreases from $Z_2$ to 1. The image
of the latter curve in the U-V plane will be denoted by $V = \Psi(U)$. Note that Fig. 2
assumes that the curve A-B is above the curve A-C, namely, that $U(1,z_2) > U(z_1,1)$
implies $V(1,z_2) > V(z_1,1)$. In the appendix, we prove that this is indeed always the
case.

Phase 2: two-dimensional condensation. We now consider the regime above the
curve $V = \Psi(U)$. Let us use the short-hand notation for the values that $U$ and $V$ take
along the curve

$$U(z_1) = U(z_1,\phi(z_1)), \qquad V(z_1) = V(z_1,\phi(z_1)).$$

We assume, for the moment, that they are both finite for all $z_1 \in [1, Z_1]$ and that
$p_m$, the achiever of $\sup_t\{tz_1^{u(t)}[\phi(z_1)]^{v(t)}\}$, is independent of $z_1$ (see the discussion in
the sequel). Both these conditions are trivially met, for example, in the model and
constraints discussed in Section 2. Let $(U,V)$ be a point above the curve $V = \Psi(U)$.
The calculation of the rate function is somewhat more involved than in the single constraint
case. To describe it, we need to give values for both $z_1$ and $z_2$. To do this, we note that
in analogy to usual BEC, we have:

$$U - U(z_1) = \lim_{M\to\infty}\frac{1}{M}\cdot\frac{p_m u(p_m)\,z_1^{u(p_m)}z_2^{v(p_m)}}{1 - p_m z_1^{u(p_m)}z_2^{v(p_m)}}$$

and similarly,

$$V - V(z_1) = \lim_{M\to\infty}\frac{1}{M}\cdot\frac{p_m v(p_m)\,z_1^{u(p_m)}z_2^{v(p_m)}}{1 - p_m z_1^{u(p_m)}z_2^{v(p_m)}}.$$

As in the ordinary BEC, where a prescription has to be specified for how the
fugacity approaches the condensation value in the condensed phase, these equations
essentially give a prescription for taking the values of the fugacities to a point where
$\sup_t[tz_1^{u(t)}[\phi(z_1)]^{v(t)}] = 1$, as the thermodynamic limit is taken. Using these, we see that

$$\frac{V - V(z_1)}{U - U(z_1)} = \frac{v(p_m)}{u(p_m)}.$$
This equation specifies, given a point $(U,V)$ above the curve $V = \Psi(U)$, the choice of
$z_1$, which we shall denote by $z_1^*$, and hence also the choice of $z_2$, which is $z_2^* = \phi(z_1^*)$.
The large deviations event is dominated by the state corresponding to $t = p_m$. Thus,
the rate function is given by

$$J(U,V) = U\ln z_1^* + V\ln z_2^* - \int_0^{p_m} g(t)\,dt\,\ln\frac{1-t}{1 - t(z_1^*)^{u(t)}(z_2^*)^{v(t)}}.$$



It must be kept in mind, however, that this solution is not applicable to all points
$(U,V)$ above the curve $V = \Psi(U)$. To understand the limitation, it is instructive
to look at the geometric interpretation of the above equation for $z_1^*$: The expression
$[V - V(z_1)]/[U - U(z_1)]$ is the slope of the straight line connecting the point $(U,V)$
to the point $(U(z_1),V(z_1))$ on the curve $V = \Psi(U)$, and the equation tells us that
this slope must be equal to $v(p_m)/u(p_m)$, which is a given constant. Therefore, this
solution is applicable only to points $(U,V)$ above the curve $V = \Psi(U)$ which have the
following property: the straight line of slope $v(p_m)/u(p_m)$ that passes through $(U,V)$
must intersect the curve $V = \Psi(U)$ between points B and C. The set of points with this
property, which corresponds to the region of two-dimensional condensation, is limited
by the curve $V = \Psi(U)$ (between B and C) and the two parallel straight lines of slope
$v(p_m)/u(p_m)$ passing through B and C (see Fig. 2).

Phase 3: non-condensed and dominated by the U-constraint. The region below
the curve A-C (see Fig. 2) is characterized by $z_2 = 1$ and $z_1 \ge 1$. The value of $z_2$ is fixed
at unity since we are considering values of $V$ which are below the corresponding average
value conditioned on the given value of $U$. This means that there is a non-condensed
large deviations behavior that is dominated by that of the constraint $\sum_i u(p_i)n_i \ge MU$
alone. In other words, the other event, $\sum_i v(p_i)n_i \ge MV$, has no impact. The rate
function is given by maximizing the term in the square brackets in eq. (8) with $z_2 = 1$.
Denoting the obtained value of $z_1$ by $z_1^*$, the rate function is given by

$$J(U,V) = U\ln z_1^* - \int_0^{p_m} dt\,g(t)\ln\frac{1-t}{1-t(z_1^*)^{u(t)}}.$$

This phase is bounded on the right by a vertical line (see Fig. 2), where the constraint
$\sum_i u(p_i)n_i \ge MU$ condenses with $z_2 = 1$.

Phase 4: condensed and dominated by the U-constraint. Following the
reasoning of phase 3, the region below the straight line of slope $v(p_m)/u(p_m)$, passing
via C, is the corresponding condensed phase of this single event $\sum_i u(p_i)n_i \ge MU$
(one-dimensional condensation), where the constraint $\sum_i v(p_i)n_i \ge MV$ has no impact. The
upper bound on the phase can be inferred by noting that on the line emerging from
point C in the figure, $z_2 = 1$.

The last two phases can be inferred from a symmetry consideration, where the two 
constraints interchange their roles. 

Phase 5: non-condensed and dominated by the V-constraint. See the
discussion for phase 3.

Phase 6: condensed and dominated by the V-constraint. See the discussion for
phase 4.

Let us examine now more closely the assumption that $U(z_1)$ and $V(z_1)$ are both
finite for a continuum of values of $z_1$. In the two-dimensional case considered now, this
issue is more involved than in the one-dimensional case: In the one-dimensional case,
the relevant integral, computed at the maximum allowed value of the fugacity parameter
$z$, may be finite if the density $g(t)$ vanishes at $t = p_m$ (the achiever of $\min_i p_i^{-1/u(p_i)}$), and
tends to zero sufficiently rapidly as $t \to p_m$. By contrast, in the two-dimensional case
considered now, the achiever of $\sup_t\{tz_1^{u(t)}[\phi(z_1)]^{v(t)}\}$ may depend, in general, on $z_1$,
and it is unreasonable to expect $g(t)$ to vanish at all these values of $t$, which may form
a continuum. (In fact, if $g(t) = 0$ for an interval, then this interval has no contribution
to the integrals altogether.) Nonetheless, there is a class of special cases where this
situation does not arise: the cases where the maximizing value of $t$ turns out to be
independent of $z_1$. For example, if $u(t)$ and $v(t)$ are both monotonically non-decreasing,
then $\sup_t\{tz_1^{u(t)}[\phi(z_1)]^{v(t)}\}$ is always achieved at $t = p_m$, independent of $z_1$, where now
$p_m$ is again the maximum value of $p$ across the support of the density $g(t)$. In this case,
as in the one-dimensional case, if $g(t) \to 0$ as $t \uparrow p_m$ sufficiently rapidly, then $U(z_1)$
and $V(z_1)$ are both finite, and then for large enough $U$ and $V$, there is a condensation
at the state corresponding to $p_m$, as explained above. It should be noted, however, that
the non-decreasing monotonicity of $u(t)$ and $v(t)$ is only a sufficient condition for $p_m$
to be independent of $z_1$, not a necessary condition. For example, ignoring our previous
assumption on the positivity of $u(t)$ and $v(t)$, if $u(t) = 1$ and $v(t) = -\ln t$, this is still
true, although $v(t) = -\ln t$ is a decreasing function.

To summarize, we have identified seven phases in the U-V plane. Denoting 
v{pq)/u(pq).



[Figure 2 image not recovered. Labels surviving from the plot: V(1, Z_2), V(1, 1),
U(Z_1, 1); region label: "phase 4 (1D): condensed, U dominates".]

Figure 2. Phase diagram in the U-V plane. Note that each one of the points
A, B and C is a meeting point of four different phases.



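the rate function takes the following forms:

$$
J(U, V) = \begin{cases}
\max_{z_1, z_2} J(z_1, z_2, U, V) & \text{phase 1} \\
J(z_t, \phi(z_t), U, V) & \text{phase 2} \\
\max_{z_1} J(z_1, 1, U, V) & \text{phase 3} \\
J(Z_1, 1, U, V) & \text{phase 4} \\
\max_{z_2} J(1, z_2, U, V) & \text{phase 5} \\
J(1, Z_2, U, V) & \text{phase 6}
\end{cases}
$$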

It is interesting to compare this phase diagram with the one which would be obtained 
by considering the equality event 

$$
\left\{ \sum_{i=0}^{M-1} u(p_i) n_i = MU, \quad \sum_{i=0}^{M-1} v(p_i) n_i = MV \right\}.
$$

In this case, the values of both $z_1$ and $z_2$ would not be restricted to be larger than 1.
Therefore, all phase transitions associated with either $z_1 = 1$ or $z_2 = 1$ would disappear
in this case. It is straightforward to see (similarly to the derivation of phase 1) that
here we have two phases only: a condensed phase and a non-condensed phase. In the
$z_1$-$z_2$ plane, the set A is no longer limited by the inequalities $z_1 > 1$ and $z_2 > 1$,
but only by the curve $z_2 = \phi(z_1)$, whose image in the U-V plane is now the entire curve
$V = \Psi(U)$, which is no longer limited by the points B and C. The region below this curve
is the non-condensed phase and the region above the curve is condensed. The condensation
is always two-dimensional in character.

Finally, it would be interesting to demonstrate that in certain situations, the
condensing state may jump abruptly as we move continuously in the U-V plane.
In the above discussion we made specific assumptions on the functions $g(t)$, $u(t)$, and
$v(t)$. In principle, it is possible to extend the calculation to cases where the achiever of
$\sup_t \{t z_1^{u(t)} [\phi(z_1)]^{v(t)}\}$ takes on any finite number of values as $z_1$
varies between 1 and $Z_1$, and where the density $g$ vanishes (sufficiently rapidly) at all
these values of $t$. An interesting scenario arises, for example, in a variation of the above
example, defined by the choices $u(t) = 1$ and $v(t) = -a - \ln t$, where $0 < a < -\ln p_m$,
and where, as before, $p_m$ is the maximum of $t$ across the support of $g(t)$. In this case,
it is easy to see that the achiever of $\sup_t \{t z_1 [\phi(z_1)]^{v(t)}\}$ is given by $p_m$
for $z_2 < e$ ($z_1 > e^a$), and by $p_\infty$, which is the minimum of $t$ across the support
of $g(t)$, for $z_2 > e$ ($z_1 < e^a$). In other words,
the condensing state jumps from $p_m$ to the other extreme, $p_\infty$, at the point $z_1 = e^a$
along the curve $V = \Psi(U)$. In this case, the two-dimensional condensed phase splits
into three sub-phases. If we denote by D the point corresponding to $z_1 = e^a$ along the
curve $V = \Psi(U)$, then above this curve, we see three different types of two-dimensional
condensation (see Fig. 3):




Figure 3. Zoom-in on the two-dimensional condensed phase in the example of
$u(t) = 1$ and $v(t) = -a - \ln t$.

(i) The region limited by the curve B-D and two parallel straight lines with slope
$\ln(1/p_\infty) - a$, passing through points B and D (phase 2a).

(ii) The region limited by the curve D-C and two parallel straight lines with slope
$\ln(1/p_m) - a$, passing through points D and C (phase 2b).

(iii) The region in between regions (i) and (ii) (phase 2c). The rate function for all
points in phase 2c is the same as at the point D.
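
The jump of the condensing state in this example can be checked numerically. The following Python sketch is only an illustration under stated assumptions: the values a = 0.5, p_m = 0.5 and p_inf = 0.1 are arbitrary choices (not taken from the text), the closed form used for phi is derived here from the boundary condition sup_t t z_1 z_2^{-a - ln t} = 1 for this particular choice of u and v, and the function names are hypothetical.

```python
import math

def phi(z1, a, p_inf, p_m):
    """Boundary value z2 = phi(z1) for u(t) = 1, v(t) = -a - ln t.

    Obtained by solving sup_t t * z1 * z2**(-a - ln t) = 1, where the
    supremum sits at p_m when z1 >= e**a and at p_inf otherwise.
    Valid for 1 <= z1 <= 1/p_m.
    """
    x = math.log(z1)
    t_star = p_m if x >= a else p_inf
    return math.exp((x + math.log(t_star)) / (a + math.log(t_star)))

def achiever(z1, z2, a, p_inf, p_m, n_grid=2001):
    """Grid search for the t in [p_inf, p_m] maximizing t*z1*z2**(-a - ln t)."""
    best_t, best_val = p_inf, -math.inf
    for k in range(n_grid):
        t = p_inf + (p_m - p_inf) * k / (n_grid - 1)
        val = t * z1 * z2 ** (-a - math.log(t))
        if val > best_val:
            best_t, best_val = t, val
    return best_t, best_val

# Illustrative parameters (assumed); they must satisfy 0 < a < -ln(p_m).
A, P_INF, P_M = 0.5, 0.1, 0.5

# Walk along the boundary curve; e**a is roughly 1.649 for a = 0.5.
for z1 in (1.2, 1.5, 1.8, 1.95):
    z2 = phi(z1, A, P_INF, P_M)
    t_star, sup_val = achiever(z1, z2, A, P_INF, P_M)
    print(f"z1={z1:.2f}  z2={z2:.3f}  achiever={t_star:.3f}  sup={sup_val:.6f}")
```

For points on the curve with z_1 below e^a the search should return the lower end of the support, and for z_1 above e^a the upper end, with the supremum staying equal to 1 along the curve up to rounding.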
5. Applications

Our large deviations analysis focuses on events associated with linear combinations 
pertaining to sequences of independent (but not necessarily identically distributed) 
geometric random variables. Beyond the obvious relevance of this model to the grand- 
canonical ensemble of the ideal boson gas, as was mentioned earlier, there are quite 
a few additional applications, which cover, not only the realm of statistical physics, 
but also that of information engineering models. We mentioned briefly some of these 
applications in the Introduction. In this section, we discuss them in somewhat more 
detail. 

The first application example is that of a one-way Markov chain (a.k.a. left-to-right
Markov chain, in the literature of speech signal processing). A one-way Markov chain is
defined by an ordered set of states (0, 1, 2, ...), where the only allowed transitions from
each state $i$ are the self-transition ($i \to i$), with probability $p_i$, and a transition to
the next state ($i \to i+1$), with probability $1 - p_i$, $i = 0, 1, 2, \ldots$ (see Fig. 4).
Clearly, every sequence generated by a one-way Markov chain, as defined, is composed of $n_0$
self-transitions of state 0, followed by $n_1$ self-transitions of state 1, followed in turn by
$n_2$ self-transitions of state 2, and so on, where $n_0, n_1, \ldots$ are independent, geometric
random variables with parameters $\{p_i\}$.
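
As a quick sanity check of this description, the following Python sketch simulates a one-way Markov chain and compares the empirical mean number of self-transitions in each state with the mean p_i/(1 - p_i) of a geometric variable with P(n) = (1 - p_i) p_i^n. The self-transition probabilities, the number of runs, the seed and the function names are all illustrative assumptions, not values from the text.

```python
import random

def simulate_dwell_counts(p, rng):
    """Run a one-way Markov chain once through states 0..len(p)-1.

    p[i] is the self-transition probability of state i. Returns a list n
    where n[i] is the number of self-transitions made in state i.
    """
    counts = []
    for p_i in p:
        n_i = 0
        while rng.random() < p_i:   # self-transition i -> i
            n_i += 1
        counts.append(n_i)          # the chain then moves i -> i+1
    return counts

def empirical_means(p, runs=20000, seed=0):
    """Average the dwell counts of each state over independent runs."""
    rng = random.Random(seed)
    totals = [0] * len(p)
    for _ in range(runs):
        for i, n_i in enumerate(simulate_dwell_counts(p, rng)):
            totals[i] += n_i
    return [s / runs for s in totals]

P = [0.5, 0.8, 0.3]   # assumed self-transition probabilities
for p_i, m in zip(P, empirical_means(P)):
    print(f"p={p_i}: empirical mean {m:.3f}, geometric mean {p_i / (1 - p_i):.3f}")
```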




Figure 4. State transition diagram for a one-way Markov chain. 

Therefore, it is clear that this model falls within our framework. The one-way 
Markov chain is a very useful model in a variety of application areas of information 
system models. A few examples are hidden Markov modeling of speech signals (see,
e.g., [22] and references therein), the segmentation of signals, such as those that govern
the evolution of the fading process of a communication channel (or channels that "heat
up" [26]), the segmentation of electrocardiographic signals (see, e.g., [57]), beat tracking
in audio signals (see, e.g., [29]), and even handwritten text recognition [27].

The interest in the large deviations behavior of linear combinations of $\{n_i\}$ is not
difficult to justify in the context of one-way Markov chains. Consider the problem of
lossless data compression of the sequence of random variables $n_0, n_1, \ldots$. An elementary
result in Information Theory (see, e.g., [28]) tells us that the optimum code length (in bits)
of the compressed version of each $n_i$ is given by

$$
\ell(n_i) = -\log P(n_i) = -\log[(1 - p_i) p_i^{n_i}] = n_i \log(1/p_i) + \log[1/(1 - p_i)],
$$

which is an affine function of $n_i$. The large deviations event $\sum_i \ell(n_i) > N = MU$ is the
event that the total code length would exceed the limit of N. If N designates the size of
a buffer in which the compressed data is stored (in order to monitor the bit rate), then 
this event has the meaning of a buffer overflow, whose consequence is that information 
is lost. We would like, of course, to keep the probability of such an event as small as 
possible. 
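
The code-length formula and the overflow event can be illustrated with a short Python sketch. It is a rough Monte Carlo estimate under assumed parameters (the list of p_i values, the buffer sizes, the number of trials and the function names are hypothetical), not a computation of the rate function itself; since the overflow event is rare for large buffers, such a naive estimate is only informative when the probability is not too small.

```python
import math
import random

def code_length(n, p):
    """Ideal code length in bits, -log2 P(n), for P(n) = (1 - p) * p**n."""
    return -math.log2((1 - p) * p ** n)

def sample_geometric(p, rng):
    """Draw n >= 0 with P(n) = (1 - p) * p**n by inverting the tail p**k."""
    u = 1.0 - rng.random()                      # uniform on (0, 1]
    return int(math.floor(math.log(u) / math.log(p)))

def overflow_frequency(ps, buffer_bits, trials=5000, seed=0):
    """Fraction of trials in which the total code length exceeds the buffer."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        total = sum(code_length(sample_geometric(p, rng), p) for p in ps)
        if total > buffer_bits:
            hits += 1
    return hits / trials

PS = [0.3, 0.5, 0.7] * 10        # assumed parameters p_i, M = 30 variables
for n_bits in (60, 90, 120):     # assumed buffer sizes N
    print(n_bits, overflow_frequency(PS, n_bits))
```

Because the same seed is reused for every buffer size, the estimated overflow frequency can only decrease or stay equal as the buffer grows.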

Another application where independent geometric random variables naturally arise
is in queuing theory. An M/M/1 queue (see, e.g., [38]) is a common model of a queue
according to which the arrival of customers is a Poisson process of rate $\lambda$, the service is
based on the principle of first come, first served (FCFS), and the service time for each
customer is distributed exponentially with rate $\mu$. As long as $\lambda < \mu$, the queue is stable
(does not diverge) and the steady-state distribution of the number of customers in the
queue is geometric with parameter $p = \lambda/\mu$, which is called the utilization of the queue.
Jackson's theorem [18] extends this to an open network (a.k.a. a Jackson network) of
M queues, which means that: (i) any external arrival to any given node is a Poisson
process, (ii) a customer completing service at queue $i$ either joins another queue $j$ with
probability $P_{ij}$ or leaves the system with probability $1 - \sum_j P_{ij}$, which is non-zero
for at least one queue, and (iii) all utilization parameters $p_i$ are less than 1. Jackson's
theorem tells us that the steady-state joint probability distribution of the queue lengths is
given by a product of individual geometric distributions with parameters $\{p_i\}$. A special
case of a queuing network was considered in Section 2.
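
The geometric steady state of the M/M/1 queue, and the product form stated by Jackson's theorem, can be checked with a small Python sketch. The rates and utilizations below are assumed for illustration, the function names are hypothetical, and the utilizations of a real network would come from solving its traffic equations, which is not shown here.

```python
def mm1_stationary(lam, mu, n_max):
    """Geometric steady-state law of an M/M/1 queue, truncated at n_max."""
    if not lam < mu:
        raise ValueError("stability requires lam < mu")
    rho = lam / mu                      # utilization
    return [(1 - rho) * rho ** n for n in range(n_max + 1)]

def balance_residual(pi, lam, mu):
    """Largest violation of the balance equations lam*pi[n] = mu*pi[n+1]."""
    return max(abs(lam * pi[n] - mu * pi[n + 1]) for n in range(len(pi) - 1))

def product_form(utilizations, lengths):
    """Joint probability of queue lengths under a product of geometric laws."""
    prob = 1.0
    for rho, n in zip(utilizations, lengths):
        prob *= (1 - rho) * rho ** n
    return prob

pi = mm1_stationary(lam=2.0, mu=5.0, n_max=200)   # assumed rates
print("total mass:", sum(pi))
print("balance residual:", balance_residual(pi, 2.0, 5.0))
print("P(n_1=1, n_2=2):", product_form([0.4, 0.5], [1, 2]))
```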

In the context of queuing networks, BEC means that one of the queues, the one 
with the highest utilization, becomes responsible for a bottleneck (or a traffic jam) - a 
linear fraction of the total number of customers spend their time in that queue due to 
the inefficient performance of the server of this queue relative to the arrival rate. When 
applied to queuing networks, our large deviations results mean that we identified BEC 
in an open (Jackson) network and in addition, we have characterized the rate function, 
as well as the phase transitions associated with it. Moreover, since we are allowing
large deviations events pertaining to arbitrary linear combinations of $\{n_i\}$, one natural
application example, as already discussed, is the large deviations behavior of $\sum_i n_i/\mu_i$
(with $\mu_i$ being the rate through queue no. $i$), which is a reasonable estimate of the total
waiting time for a customer who visits all queues.

There are, of course, other network models that are known to admit a product-form
steady-state distribution. One of them is the closed-network version of the Jackson
network, called the Gordon-Newell network [39], [40]. The only difference between the
Gordon-Newell network and the Jackson network is that the former is a closed network 
(unlike a Jackson network which is open), i.e., there is no external supply of customers 
and no departures from the system, and so, the total number of customers is fixed. 
The steady-state distribution for the Gordon-Newell network is exactly analogous to 
the canonical Bose-Einstein distribution, and hence it exhibits BEC under certain 
conditions, as was observed already in earlier work, cf. e.g., [36] and [31]. 

The Gordon-Newell theorem appears to be a special case of results concerning 
product forms of steady-state distributions in classes of models, such as the zero-range
process (ZRP) (see, e.g., [20], [IZ] and references therein), that are studied in
the statistical physics literature. According to the ZRP model, particles (customers) 
that lie in an array of sites (a lattice, or more generally, the nodes of a certain graph), 
may hop from one site (queue) to another, and may pile up, according to certain rules 
(see, e.g., the example discussed in Section 2). Jackson's theorem, however, does not 
seem to be directly derivable as a special case since it pertains to an open network. A 
subsequent paper by Jackson [19] allows state-dependent service times, and it seems to
include the ZRP model as a special case.
Appendix

In this appendix, we prove that $U(1, z_2) \ge U(z_1, 1)$ implies $V(1, z_2) \ge V(z_1, 1)$, which
means that the A-B curve in Fig. 2 lies above the A-C curve.
Consider the function 



where t is a parameter taking values in the range where the denominator is strictly 
positive. For a given t > 0, this function is clearly monotonically non-decreasing in x. 
Therefore, for all t: 



Integrating over t, we get:

Since the first bracketed term of the last expression is non-positive (by hypothesis) and
since $\ln z_1 > 0$ and $\ln z_2 > 0$, the second bracketed term must be non-negative, which
proves the claim.